Chinese-English Parallel Corpus Construction and its Application
Abstract
Chinese-English parallel corpora are key resources for Chinese-English cross-language information processing, bilingual lexicography, and language research and teaching. However, no large-scale Chinese-English parallel corpus is available yet, given the difficulty and intensive labour involved. In this paper, we present our work towards building a large-scale Chinese-English parallel corpus. We describe the collection, annotation, and mark-up of the parallel Chinese-English texts and the workflow we used to construct the corpus. We also present our work on tools that make the corpus easy to construct and to use for different purposes. Among these tools, a parallel concordance tool that we developed is examined in detail. Several ongoing applications of the corpus are also briefly introduced.
Similar resources
Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction
This paper first describes an experiment to construct an English-Chinese parallel corpus, then applies the Uplug word alignment tool to the corpus, and finally produces and evaluates an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually trans...
Mining Large-scale Parallel Corpora from Multilingual Patents: An English-Chinese example and its application to SMT
In this paper, we demonstrate how to mine large-scale parallel corpora with multilingual patents, which have not been thoroughly explored before. We show how a large-scale English-Chinese parallel corpus containing over 14 million sentence pairs with only 1-5% wrong can be mined from a large amount of English-Chinese bilingual patents. To our knowledge, this is the largest single parallel corpu...
The Construction of a Chinese-English Patent Parallel Corpus
In this paper, we describe the construction of a parallel Chinese-English patent sentence corpus which is created from noisy parallel patents. First, we use a publicly available sentence aligner to find parallel sentence candidates in the noisy parallel data. Then we compare and evaluate three individual measures and different ensemble techniques to sort the parallel sentence candidates accordi...
Exploring Parallel Concordancing in English and Chinese
This paper investigates the value of computer technology as a medium for the delivery of parallel texts in English and Chinese for language learning. An English-Chinese parallel corpus was created for use in parallel concordancing, a technique which has been developed to respond to the desire to study language in its natural contexts of use. Specific problems of dealing with Chinese characters ...
Contrastive connectors in English and Chinese: A case
This comparative study of however and its Chinese counterparts in two translation corpora (the HLM parallel corpus, and the Babel English-Chinese Parallel Corpus) reveals that the Chinese contrastive relations tend to be expressed implicitly (cf. Wang and Zheng 2004) and Chinese contrastive connectors are generally used in sentence initial position, whereas the English contrastive relations ten...